Author Name Disambiguation for Citations Using Topic and Web Correlation

Authors

  • Kai-Hsiang Yang
  • Hsin-Tsung Peng
  • Jian-Yi Jiang
  • Hahn-Ming Lee
  • Jan-Ming Ho
Abstract

Today, bibliographic digital libraries play an important role in helping members of the academic community search for novel research. Author disambiguation for citations is a major problem during data integration and cleaning, since author names are often highly ambiguous. To address this problem, we propose two kinds of correlation between citations, Topic Correlation and Web Correlation, which exploit relationships between citations to identify whether two citations with the same author name refer to the same individual. Topic correlation measures the similarity between the research topics of two citations, while Web correlation measures the number of co-occurrences in web pages. We employ a pair-wise grouping algorithm to group citations into clusters. Experimental results show that disambiguation accuracy improves greatly when topic correlation and Web correlation are used, and that Web correlation provides stronger evidence about the authors of citations.
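The abstract does not give the exact definitions of the two correlations or of the grouping rule, so the following Python sketch is only an illustration under stated assumptions. Topic correlation is approximated here by cosine similarity over title words, Web correlation by the number of web pages that list both citations, and two citations are merged when either score reaches its threshold. The record fields `title` and `pages`, the function names, and both thresholds are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import sqrt


def topic_correlation(a, b):
    """Assumed stand-in: cosine similarity of bag-of-words title vectors."""
    va = Counter(a["title"].lower().split())
    vb = Counter(b["title"].lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def web_correlation(a, b):
    """Assumed stand-in: number of web pages on which both citations occur."""
    return len(set(a["pages"]) & set(b["pages"]))


def group_citations(citations, topic_threshold=0.5, web_threshold=1):
    """Pair-wise grouping: merge two citations when either correlation is high enough.

    Thresholds and the OR combination are illustrative choices, not the paper's.
    Returns clusters as sorted lists of citation indices.
    """
    n = len(citations)
    parent = list(range(n))  # union-find forest over citation indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            a, b = citations[i], citations[j]
            if (topic_correlation(a, b) >= topic_threshold
                    or web_correlation(a, b) >= web_threshold):
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Because merging is transitive, two citations that never co-occur on a page can still land in the same cluster through a third citation that links them. Whether the original algorithm behaves this way is not stated in the abstract.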


Similar resources

Improving the Accuracy of Author Name Disambiguation Using Ensemble Clustering

Today, digital libraries are important academic resources containing millions of citations and essential bibliographic information such as titles, author names and location of publications. From the viewpoint of knowledge accumulation management, the ability to search for the desired content quickly and accurately is of great importance. The complexity and similarity in these resources cause many challenges and...


Extracting Citation Relationships from Web Documents for Author Disambiguation

Disambiguating the citation records of authors with the same name is an interesting and challenging problem that affects many research and application fields, such as digital libraries. However, current bibliographic digital libraries like CiteSeer cannot correctly disambiguate citation records because of two problems: information sparsity (citations for an individual have few or no common...


On co-authorship for author disambiguation

Author name disambiguation deals with clustering the same-name authors into different individuals. To attack the problem, many studies have employed a variety of disambiguation features such as coauthors, titles of papers/publications, topics of articles, emails/affiliations, etc. Among these, co-authorship is the most easily accessible and influential, since inter-person acquaintances represen...


Using the semantic web for author disambiguation - are we there yet?

The quality, and therefore, the usability and reliability of data in digital libraries depends on author disambiguation, i.e., the correct assignment of publications to a particular person. Author disambiguation aims to resolve name ambiguity, i.e., synonyms (the same author publishing under different names), and polysemes (different authors with the same name), and assign publications to the c...


A Model-based K-means Algorithm for Name Disambiguation

Unambiguous identities of resources are an important aspect of the semantic web. This paper addresses the personal identity issue in the context of bibliographies. Because of abbreviations or misspellings of names in publications or bibliographies, an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of identity matching, document ...




Publication date: 2008